lectures.alex.balgavy.eu

Lecture notes from university.
git clone git://git.alex.balgavy.eu/lectures.alex.balgavy.eu.git
Log | Files | Refs | Submodules

Branch delays.html (3575B)


      1 <?xml version="1.0" encoding="UTF-8"?>
      2 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
      3 <html><head><link rel="stylesheet" href="sitewide.css" /><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/><meta name="exporter-version" content="Evernote Mac 6.13.1 (455785)"/><meta name="altitude" content="-0.2646212577819824"/><meta name="author" content="Alex Balgavy"/><meta name="created" content="2017-12-18 12:58:58 PM +0000"/><meta name="latitude" content="52.37360486727292"/><meta name="longitude" content="4.8363388956229"/><meta name="source" content="desktop.mac"/><meta name="updated" content="2017-12-18 1:41:48 PM +0000"/><title>Branch delays</title></head><body><div><b>Unconditional branches</b></div><div>the two instructions that are fetched during decode and compute of first instruction (a branch) have to be discarded</div><div>this two cycle-delay — “branch penalty”</div><div><br/></div><div><img src="Branch%20delays.resources/screenshot_1.png" height="492" width="739"/></div><div><br/></div><div>to reduce the penalty, branch target address must be computed earlier than pipeline — in the decode stage</div><div>this reduces the penalty to one cycle:</div><div><br/></div><div><img src="Branch%20delays.resources/screenshot_2.png" height="397" width="678"/></div><div><br/></div><div>this needs hardware modification — PC has to be incremented in every cycle, and a second adder is needed in decode stage to compute branch target address for every instruction</div><div><br/></div><div><b>Conditional branches</b></div><div>branch condition must be tested as early as possible</div><div>comparator to test condition can be moved to decode stage</div><div>it would use values from register file outputs A and B directly</div><div><br/></div><div><b>Branch delay slot — compiler reorganises instructions</b></div><div>branch delays slot — the location that follows a branch instruction</div><div>compiler tries to find an instruction that it always executed, independent of whether or not the program branches</div><div>data dependencies must be preserved</div><div>if the compiler can find a useful instruction, there’s no branch penalty</div><div>otherwise, it NOPs out and there’s a penalty of one cycle</div><div><br/></div><div><img src="Branch%20delays.resources/screenshot.png" height="586" width="490"/></div><div><br/></div><div><br/></div><div><b>Branch prediction</b></div><div>Static:</div><div><ul><li>assume branch will not be taken, fetch next instruction in sequential order<br/></li><li>simple, decent accuracy</li><li>processor can determine static prediction by checking sign of branch offset</li><li>other option — encoding of branch instruction can include a bit indicating whether prediction should be ‘take’ or ‘not taken’ (set by compiler)</li></ul><div><br/></div></div><div>Dynamic:</div><div><ul><li>use actual branch behaviour</li><li>processor assumes that next time an instruction is executed, branch decision is likely to be the same as last time</li><li>keep more information about execution history, such as four states (strongly taken, likely taken, likely not taken, strongly not taken)</li><li>keep a branch target buffer</li><ul><li>identifies branch instructions by their addresses</li><li>in the form of lookup table</li><li>contains: address of branch instruction, one/two state bits for branch prediction algorithm (outcome), branch target address</li></ul></ul></div><div><br/></div></body></html>